
144 research outputs found

Improving Entity Linking by Modeling Latent Entity Type Information

Author: Chen Shuang
Jiang Feng
Lin Chin-Yew
Wang Jinpeng
Publication date: 06/01/2020

Existing state-of-the-art neural entity linking models employ an attention-based bag-of-words context model and pre-trained entity embeddings bootstrapped from word embeddings to assess topic-level context compatibility. However, the latent entity type information in the immediate context of the mention is neglected, which often causes the models to link mentions to incorrect entities of the wrong type. To tackle this problem, we propose to inject latent entity type information into the entity embeddings based on pre-trained BERT. In addition, we integrate a BERT-based entity similarity score into the local context model of a state-of-the-art model to better capture latent entity type information. Our model significantly outperforms the state-of-the-art entity linking models on the standard benchmark (AIDA-CoNLL). Detailed experimental analysis demonstrates that our model corrects most of the type errors produced by the direct baseline.

Comment: Accepted by AAAI 202

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

A new result on observer-based sliding mode control design for a class of uncertain Itô stochastic delay systems

Author: Gao Cunchen
Liu Zhen
Yu Jinpeng
Zhao Lin
Zhu Quanmin
Publication venue: Elsevier BV
Publication date: 01/12/2017

© 2017 The Franklin Institute. This paper develops a new observer-based sliding mode control (SMC) scheme for a general class of Itô stochastic delay systems (SDS). The key merit of the presented scheme lies in its simplicity and integrity in the design process of the traditional sliding mode observer (SMO) strategy, i.e., the state observer and sliding surface design as well as the associated sliding mode controller synthesis. To guarantee the applicability of the scheme, a new LMI-based criterion is established to ensure the exponential stability of the underlying sliding mode dynamics (SMDs) in the mean-square sense with H∞ performance. A bench test example is provided to numerically demonstrate the efficacy of the scheme and to illustrate the application procedure for potential readers and users interested in their own ad hoc applications and in extending the methodology.

UWE Bristol Research Repository

TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design

Author: Gao Yifan
Ge Tiezheng
Jiang Yuning
Lin Jinpeng
Liu Chuanbin
Xie Hongtao
Zhou Min
Publication date: 12/08/2023

Text design is one of the most critical procedures in poster design, as it relies heavily on the creativity and expertise of humans to design text images that account for both visual harmony and text semantics. This study introduces TextPainter, a novel multimodal approach that leverages contextual visual information and corresponding text semantics to generate text images. Specifically, TextPainter takes the global-local background image as a hint of style and guides the text image generation with visual harmony. Furthermore, we leverage the language model and introduce a text comprehension module to achieve both sentence-level and word-level style variations. In addition, we construct the PosterT80K dataset, consisting of about 80K posters annotated with sentence-level bounding boxes and text contents. We hope this dataset will pave the way for further research on multimodal text image generation. Extensive quantitative and qualitative experiments demonstrate that TextPainter can generate visually-and-semantically-harmonious text images for posters.

Comment: Accepted to ACM MM 2023. Dataset Link: https://tianchi.aliyun.com/dataset/16003

arXiv.org e-Print Archive

Too Large; Data Reduction for Vision-Language Pre-Training

Author: Lei Stan Weixian
Lin Kevin Qinghong
Shou Mike Zheng
Wang Alex Jinpeng
Zhang David Junhao
Publication date: 31/05/2023

This paper examines the problems of severe image-text misalignment and high redundancy in the widely used large-scale Vision-Language Pre-Training (VLP) datasets. To address these issues, we propose an efficient and straightforward Vision-Language learning algorithm called TL;DR, which aims to compress the existing large VLP data into a small, high-quality set. Our approach consists of two major steps. First, a codebook-based encoder-decoder captioner is developed to select representative samples. Second, a new caption is generated to complement the original captions for selected samples, mitigating the text-image misalignment problem while maintaining uniqueness. As a result, TL;DR enables us to reduce the large dataset into a small set of high-quality data, which can serve as an alternative pre-training dataset. This algorithm significantly speeds up the time-consuming pretraining process. Specifically, TL;DR can compress the mainstream VLP datasets at a high ratio, e.g., reducing the well-cleaned CC3M dataset from 2.82M to 0.67M (~24%) and the noisy YFCC15M from 15M to 2.5M (~16.7%). Extensive experiments with three popular VLP models over seven downstream tasks show that a VLP model trained on the compressed dataset provided by TL;DR can achieve similar or even better results compared with training on the full-scale dataset. The code will be made available at https://github.com/showlab/data-centric.vlp.

Comment: Work in progress. Code: https://github.com/showlab/data-centric.vl

arXiv.org e-Print Archive

Genome-Wide DNA Methylation Profiling in Human Breast Tissue by Illumina TruSeq Methyl Capture EPIC Sequencing and Infinium MethylationEPIC Beadchip Microarray

Author: Castle James
He Chunyan
Lin Nan
Liu Jinpeng
Liu Yunlong
Shendre Aditi
Wan Jun
Wang Chi
Publication venue: UKnowledge
Publication date: 13/10/2020

A newly developed platform, the Illumina TruSeq Methyl Capture EPIC library prep (TruSeq EPIC), builds on the content of the Infinium MethylationEPIC Beadchip Microarray (EPIC-array) and leverages the power of next-generation sequencing for targeted bisulphite sequencing. We empirically examined the performance of TruSeq EPIC and EPIC-array in assessing genome-wide DNA methylation in breast tissue samples. TruSeq EPIC provided data with a much higher density in the regions when compared to EPIC-array (~2.74 million CpGs with at least 10X coverage vs ~752 K CpGs, respectively). Approximately 398 K CpGs were common and measured across the two platforms in every sample. Overall, there was high concordance in methylation levels between the two platforms (Pearson correlation r = 0.98, P < 0.0001). However, we observed that TruSeq EPIC measurements provided a wider dynamic range and likely a higher quantitative sensitivity for CpGs that were either hypo- or hyper-methylated (β close to 0 or 1, respectively). In addition, when comparing different breast tissue types, TruSeq EPIC identified more differentially methylated CpGs than EPIC-array, not only out of additional sites interrogated by TruSeq EPIC alone, but also out of common sites interrogated by both platforms. Our results suggest that both platforms show high reproducibility and reliability in genome-wide DNA methylation profiling, while TruSeq EPIC had a significant improvement over EPIC-array regarding genomic resolution and coverage. The wider dynamic range and likely higher precision of the estimates by the TruSeq EPIC may lead to the identification of novel differentially methylated markers that are associated with disease risk.

IUPUIScholarWorks

Directory of Open Access Journals

PubMed Central

University of Kentucky